---
title: Configure environment plugins
description: Configure prediction environment plugins for the MLOps management agent.

---

# Configure management agent environment plugins

Management agent plugins deploy and manage models in a given prediction environment. The management agent submits commands to the plugin, and the plugin executes them and returns the status of the command to the management agent. To facilitate this interaction, you provide prediction environment details during plugin configuration, allowing the plugin to execute commands in that environment. For example, a Kubernetes plugin can launch a deployment (container) in a Kubernetes cluster, replace a model in the deployment, stop the container, etc.

The MLOps management agent contains the following example plugins:

* Filesystem plugin.
* Docker plugin.
* Kubernetes plugin.
* Test plugin.

!!! note
    These example plugins are installed as part of the `datarobot_bosun-*-py3-none-any.whl` wheel file.

## Configure example plugins {: #configure-example-plugins }

The following example plugins require additional configuration for use with the management agent:

=== "Filesystem"

    To enable communication between the management agent and the deployment, the filesystem plugin creates one directory per deployment in the local filesystem, and downloads each deployment's model package and configuration `.yaml` file into the deployment's local directory. These artifacts can then be used to serve predictions from a Portable Prediction Server (PPS) container.

    ``` yaml title="plugin.filesystem.conf.yaml"
    # The top-level directory that will be used to store each deployment directory
    baseDir: "."

    # Each deployment directory will be prefixed with the following string
    deploymentDirPrefix: "deployment_"

    # The name of the deployment config file to create inside the deployment directory.
    # Note: If working with the PPS, DO NOT change this name; the PPS expects this filename.
    deploymentInfoFile: "config.yml"

    # If defined, this string will be prefixed to the predictions URL for this deployment,
    # and the URL will be returned, with the deployment id suffixed to the end with the
    # /predict endpoint.
    deploymentPredictionBaseUrl: "http://localhost:8080"

    # If defined, create a YAML file containing the key-value (kv) configuration of the deployment.
    # If the name of the file is the same as the deploymentInfoFile,
    # the key values are added to the same file as the other config.
    # deploymentKVFile: "kv.yaml"
    ```

=== "Docker"

    The Docker plugin can deploy native DataRobot models and custom models on a Docker server. In addition, the plugin automatically runs the monitoring agent to monitor deployed models and uses the `traefik` reverse proxy to provide a single prediction endpoint for each deployment.

    The management agent's Docker plugin supports the use of the [Portable Prediction Server](portable-pps), allowing a single Docker container to serve multiple models. It enables you to configure the PPS to indicate where models for each deployment are located and gives you the ability to start, stop, and manage deployments.

    The Docker plugin can:

    * Retrieve a model package from DataRobot for a deployment.
    * Launch the DataRobot model within the Docker container.
    * Shut down and clean up the Docker container.
    * Report status back via events.
    * Monitor predictions using the monitoring agent.

    To configure the Docker plugin, take the following steps:

    1. Set up the environment required for the Docker plugin:

        ``` bash
        docker pull rabbitmq:3-management
        docker pull traefik:2.3.3
        docker network create bosun
        ```
    2. Build the monitoring agent container image:

        ``` bash
        cd datarobot_mlops_package-*/
        cd tools/agent_docker
        make build
        ```
    3. Download the [Portable Prediction Server](portable-pps) from the DataRobot UI. If you are planning to use a custom model image, make sure the image is built and accessible to the Docker service.

    4. Configure the Docker plugin configuration file:

        ```yaml title="plugin.docker.conf.yaml"
        # Docker network on which to run all containers.
        # This network must be created prior to running
        # the agent (i.e., 'docker network create <NAME>')
        dockerNetwork: "bosun"

        # Traefik image to use
        traefikImage: "traefik:2.3.3"

        # Address that will be reported to DataRobot
        outfacingPredictionURLPrefix: "http://10.10.12.22:81"

        # MLOps Agent image to use for monitoring
        agentImage: "datarobot/mlops-tracking-agent:latest"

        # RabbitMQ image to use for building a channel
        rabbitmqImage: "rabbitmq:3-management"

        # PPS base image
        ppsBaseImage: "datarobot/datarobot-portable-prediction-api:latest"

        # Prefix for generated images
        generatedImagePrefix: "mlops_"

        # Prefix for running containers
        containerNamePrefix: "mlops_"

        # Mapping of traefik proxy ports (not mandatory)
        traefikPortMapping:
            80: 81
            8080: 8081

        # Mapping of RabbitMQ (not mandatory)
        rabbitmqPortMapping:
            15672: 15673
            5672: 5673
        ```

=== "Kubernetes"

    DataRobot provides a plugin to deploy and manage models in your Kubernetes cluster without writing any additional code. For configuration information, see the README file in the `tools/charts/datarobot-management-agent` folder in the tarball.

    ``` yaml title="plugin.k8s.conf.yaml"
    ## The following settings are related to connecting to your Kubernetes cluster
    #
    # The name of the kube-config context to use (similar to --context argument of kubectl). There is a special
    # `IN_CLUSTER` string to be used if you are running the plugin inside a cluster. The default is "IN_CLUSTER"
    # kubeConfigContext: IN_CLUSTER

    # The namespace in which you want to create and manage external deployments (similar to --namespace argument of kubectl). You
    # can leave this as `null` to use the "default" namespace, the namespace defined in your context, or (if running `IN_CLUSTER`)
    # manage resources in the same namespace the plugin is executing in.
    # kubeNamespace:

    ## The following settings are related to whether or not MLOps monitoring is enabled
    #
    # We need to know the location of the dockerized agent image that can be launched into your Kubernetes cluster.
    # You can build the image by running `make build` in the tools/agent_docker/ directory and retagging the image
    # and pushing it to your registry.
    # agentImage: "<FILL-IN-DOCKER-REGISTRY>/mlops-tracking-agent:latest"

    ## The following settings are all related to accessing the model from outside the Kubernetes cluster
    #
    # The URL prefix used to access the deployed model, e.g., https://example.com/deployments/
    # The model will be accessible via <outfacingPredictionURLPrefix>/<model_id>/predict
    outfacingPredictionURLPrefix: "<FILL-CORRECT-URL-FOR-K8S-INGRESS>"

    # We are still using the beta Ingress resource API, so a class must be provided. If your cluster
    # doesn't have a default ingress class, please provide one.
    # ingressClass:

    ## The following settings are all related to building the finalized model image (base image + mlpkg)
    #
    # The location of the Portable Prediction Server base image. You can download it from DataRobot's developer
    # tools section, retag it, and push it to your registry.
    ppsBaseImage: "<FILL-IN-DOCKER-REGISTRY>/datarobot-portable-prediction-api:latest"

    # The Docker repo to which this plugin can push finalized models. The built images will be tagged
    # as follows: <generatedImageRepo>:m-<model_pkg_id>
    generatedImageRepo: "<FILL-IN-DOCKER-REGISTRY>/mlops-model"

    # We use Kaniko to build our finalized image. See https://github.com/GoogleContainerTools/kaniko#readme.
    # The default is to use the image below.
    # kanikoImage: "gcr.io/kaniko-project/executor:v1.5.2"

    # The name of the Kaniko ConfigMap to use. This provides the settings Kaniko will need to be able to push to
    # your registry type. See https://github.com/GoogleContainerTools/kaniko#pushing-to-different-registries.
    # The default is to not use any additional configuration.
    # kanikoConfigmapName: "docker-config"

    # The name of the Kaniko Secret to use. This provides the settings Kaniko will need to be able to push to
    # your registry type. See https://github.com/GoogleContainerTools/kaniko#pushing-to-different-registries.
    # The default is to not use any additional secrets. The secret must be of the type: kubernetes.io/dockerconfigjson
    # kanikoSecretName: "registry-credentials"

    # The name of a service account to use for running Kaniko if you want to run it in a more secure fashion.
    # See https://github.com/GoogleContainerTools/kaniko#security.
    # The default is to use the "default" service account in the namespace in which the pod runs.
    # kanikoServiceAccount: default
    ```

=== "Test"

    To configure the test plugin, use the `--plugin test` option and set the temporary directory and sleep time (in seconds) for each action executed by the test plugin. For example, with `launch_time_sec: 1` in the test plugin configuration below, the deployment launch action creates a temporary file for the deployment, sleeps for 1 second, and then returns.

    ``` yaml title="plugin.test.conf.yaml"
    tmp_dir: "/tmp"
    launch_time_sec: 1
    stop_time_sec: 1
    replace_model_time_sec: 1
    pe_status_time_sec: 1
    deployment_status_time_sec: 1
    deployment_list_time_sec: 1
    plugin_start_time: 1
    plugin_stop_time: 1
    ```

## Create a custom plugin {: #create-a-custom-plugin }

The management agent's plugin framework is flexible enough to accommodate custom plugins. This flexibility is helpful when you have a custom prediction environment (different from, for example, the standard Docker or Kubernetes environment) in which you deploy your models. You can implement a plugin for such a prediction environment either by modifying the existing plugin or by implementing one from scratch. You can use the filesystem plugin as a reference when creating a custom Python plugin.

!!! note
    Currently, custom Java plugins are not supported.

The following section describes the interface you implement to write a custom Python plugin.

### Implement the plugin interface {: #implement-the-plugin-interface }

The management agent Python package defines the [abstract base class](https://docs.python.org/3/library/abc.html) `BosunPluginBase`. Each management agent plugin *must* inherit and implement the interface defined by this base class.

To start implementing a custom plugin (`SamplePlugin` below), inherit from the `BosunPluginBase` base class. As an example, implement the plugin in the `sample_plugin.py` file under the `sample_plugin` directory:

``` python
class SamplePlugin(BosunPluginBase):
    def __init__(self, plugin_config, private_config_file=None, pe_info=None, dry_run=False):
```

#### Python plugin arguments {: #python-plugin-arguments }

The constructor is invoked with the following arguments:

Argument | Definition
---------|-----------
`plugin_config`       | A dictionary containing general information about the plugin, described in [Create the action config file](#create-the-action-config-file).
`private_config_file` | Path to the private configuration file for the plugin, as passed in by the `--private-config` flag when calling the `bosun-plugin-runner` script. This file is optional, and its contents are fully at the discretion of your custom plugin.
`pe_info`             | An instance of `PEInfo`, which contains information about the prediction environment. This parameter is unset for certain actions.
`dry_run`             | Indicates whether the invocation is a dry run (development) or an actual run.
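
As a minimal sketch of how a constructor might store these arguments, consider the following. The local `BosunPluginBase` stand-in exists only so the snippet runs on its own; a real plugin inherits from the class provided by the management agent Python package. The attribute names are illustrative assumptions, and whether the real base class expects a `super().__init__()` call is not covered here.

```python
class BosunPluginBase:
    """Local stand-in so this sketch is runnable; NOT the real base class."""


class SamplePlugin(BosunPluginBase):
    def __init__(self, plugin_config, private_config_file=None, pe_info=None, dry_run=False):
        # General plugin information (the `pluginConfig` section of the action config).
        self.plugin_config = plugin_config
        # Optional path passed with --private-config; its contents are plugin-defined.
        self.private_config_file = private_config_file
        # Prediction environment details; may be None for some actions.
        self.pe_info = pe_info
        # Assumed here: True for a dry (development) run, False for an actual run.
        self.dry_run = dry_run


# Example values are taken from the sample action config in this document.
plugin = SamplePlugin({"name": "ExternalCommand-1", "type": "ExternalCommand"}, dry_run=True)
print(plugin.plugin_config["name"], plugin.dry_run)
```

Keeping the arguments as plain attributes lets each action method decide later which of them it needs.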

#### Python plugin methods {: #python-plugin-methods }

This class implements the following methods:

!!! note
    The return type for each of the following functions must be `ActionStatusInfo`.

``` python
def plugin_start(self):
```

This method initializes the plugin; for example, it can check if the plugin can connect with the prediction environment (e.g., Docker, Kubernetes). In the case of the filesystem plugin, this method checks if the `baseDir` exists on the filesystem. The management agent typically invokes this method only once, during the startup process. This method is guaranteed to be called before any deployment-specific action is invoked.

<hr>

``` python
def plugin_stop(self):
```

This method implements any tear-down process, for example, closing client connections to the prediction environment. The management agent typically invokes this method only once, during the shutdown process. This method is guaranteed to be called after all deployment-specific actions are done.

<hr>

``` python
def deployment_list(self):
```

This method returns the list of deployments already running in the given prediction environment. The management agent typically invokes this method during startup to determine which deployments are already running in the prediction environment. The list of deployments is returned as a map of `deployment_id` to Deployment Information, using the `data` field in the `ActionStatusInfo` object (described below).

<hr>

``` python
def deployment_start(self, deployment_info):
```

This method implements the deployment launch process. The management agent invokes this method when a deployment is created or activated in DataRobot. For example, this method can launch a container in the Kubernetes or Docker service. In the case of the filesystem plugin, this method creates a directory with the name `deployment_<deployment_id>` and then places the deployment's model and a YAML configuration file in the new directory. The plugin should ensure that the deployment in the prediction environment is uniquely identifiable by the deployment ID and, ideally, by the paired deployment ID and model ID. For example, the built-in Docker plugin launches the container with the following name: `deployment_<deployment_id>_<model-id>`.

<hr>

``` python
def deployment_stop(self, deployment_info):
```

This method implements the deployment stop process. The management agent invokes this method when a deployment is deactivated or deleted in DataRobot. For example, this method can stop the container in the Kubernetes or Docker service. The deployment ID and model ID from the `deployment_info` uniquely identify the container that needs to be stopped. In the case of the filesystem plugin, this method removes the directory created for that deployment by the `deployment_start` method.

<hr>

``` python
def deployment_replace_model(self, deployment_info):
```

This method implements the model replacement process for a deployment. The management agent invokes this method when a model is replaced in a deployment in DataRobot. `modelArtifact` contains the path to the new model, and `newModelId` contains the ID of the new model to use for replacement. In the case of the Docker or Kubernetes plugin, a potential implementation of this method could stop the container with the old model ID and then start a new container with the new model. In the case of the filesystem plugin, this method removes the old deployment directory and creates a new one with the new model.

<hr>

``` python
def pe_status(self):
```

This method queries the status of the prediction environment, for example, whether the Kubernetes or Docker service is still reachable. The management agent periodically invokes this method to ensure the prediction environment is in a good state. To improve the experience, the plugin can also support queries for the status of the deployments running in the prediction environment. In this case, the IDs of the deployments are included in the `deployments` field of the `peInfo` structure (described below), and the status of each deployment is returned using the `data` field in the `ActionStatusInfo` object (described below). The deployment status is returned as a map of `deployment_id` to Deployment Information.
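
The combined report described above might be assembled as in the following sketch. `ActionStatus` and `ActionStatusInfo` are local stand-ins that only mirror the names and constructor signature given in this document, the enum member values are placeholders, and the shape of each Deployment Information entry (a dictionary with a `state` key) is an assumption, as is the `running_ids` argument.

```python
from enum import Enum


class ActionStatus(Enum):
    """Local stand-in; member values are placeholders."""
    OK = "ok"
    WARN = "warn"
    ERROR = "error"
    UNKNOWN = "unknown"


class ActionStatusInfo:
    """Local stand-in with the constructor signature shown in this document."""
    def __init__(self, status, msg=None, state=None, duration=None, data=None):
        self.status, self.msg, self.state = status, msg, state
        self.duration, self.data = duration, data


def pe_status(deployment_ids, running_ids):
    """Report PE status plus a per-deployment status map in `data`.

    `deployment_ids` stands for the `deployments` list from `peInfo`;
    `running_ids` is a hypothetical set of deployments found running.
    """
    data = {}
    for deployment_id in deployment_ids:
        state = "ready" if deployment_id in running_ids else "stopped"
        # Assumed shape of "Deployment Information": a dict with a `state` key.
        data[deployment_id] = {"state": state}
    return ActionStatusInfo(ActionStatus.OK, msg="PE reachable", data=data)


result = pe_status(["deployment-1", "deployment-2"], running_ids={"deployment-1"})
print(result.data)
```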

<hr>

``` python
def deployment_status(self):
```

This method queries the status of a deployment in the prediction environment, for example, whether the container corresponding to the deployment is still up and running. The management agent periodically invokes this method to ensure that the deployment is in a good state.

<hr>

``` python
def deployment_relaunch(self, deployment_info):
```

This method implements the process of relaunching (stopping and then starting) the deployment. The management agent Python package already provides a **default implementation** of this method by invoking `deployment_stop` followed by `deployment_start`; however, the plugin can implement its own relaunch mechanism if there is a more efficient way to relaunch a deployment.
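
The stop-then-start order of the default behavior can be illustrated with the sketch below. `RelaunchSketch` is a hypothetical class, not the packaged implementation: return values are simplified to plain strings instead of `ActionStatusInfo` objects, `deployment_info` is assumed to be dictionary-like, and how the packaged default handles a failed stop is not shown.

```python
class RelaunchSketch:
    """Hypothetical class that records calls to illustrate the relaunch order."""

    def __init__(self):
        self.calls = []

    def deployment_stop(self, deployment_info):
        self.calls.append(("stop", deployment_info["id"]))
        return "stopped"

    def deployment_start(self, deployment_info):
        self.calls.append(("start", deployment_info["id"]))
        return "ready"

    def deployment_relaunch(self, deployment_info):
        # Stop first, then start; return the result of the start step.
        self.deployment_stop(deployment_info)
        return self.deployment_start(deployment_info)


plugin = RelaunchSketch()
result = plugin.deployment_relaunch({"id": "deployment-1"})
print(plugin.calls, result)
```

A plugin whose environment supports an in-place restart could override `deployment_relaunch` with a single call instead of this two-step sequence.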

<hr>

#### Python plugin return value {: #python-plugin-return-value }

The return value for all these operations is an `ActionStatusInfo` object providing the status of the action:

```python
class ActionStatusInfo:
    def __init__(self, status, msg=None, state=None, duration=None, data=None):
```

This object contains the following fields:

Field | Definition
------|-----------
`status`   | Indicates the status of the action. <br> **Values**: `ActionStatus.OK`, `ActionStatus.WARN`, `ActionStatus.ERROR`, and `ActionStatus.UNKNOWN`
`msg`      | A string message that the plugin can forward to the management agent, which, in turn, forwards the message to the MLOps service (DataRobot).
`state`    | Indicates the state of the deployment after the execution of the action. <br> **Values**: `ready`, `stopped`, and `errored`.
`duration` | Indicates the time the action took to execute.
`data`     | Information that the plugin can forward to the management agent. Currently, the `deployment_list` method uses this field to list the deployments in the form of a dictionary of `deployment_id` to Deployment Information. This field can also be used by the `pe_status` method to report the status of deployments running in the prediction environment in addition to the prediction environment status.

!!! note
    The base class automatically adds the `timestamp` to the object to keep track of different action status values.
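
To illustrate how actions might build and return these status objects, the sketch below tracks each deployment as a directory, loosely modeled on the filesystem plugin's behavior described earlier. Everything in it is an assumption made for illustration: `ActionStatus` and `ActionStatusInfo` are local stand-ins, `DirectoryPlugin` does not inherit from the real base class, `deployment_info` is treated as a dictionary, `duration` is measured in seconds, and the Deployment Information entries are simple dictionaries.

```python
import shutil
import tempfile
import time
from enum import Enum
from pathlib import Path


class ActionStatus(Enum):
    """Local stand-in; member values are placeholders."""
    OK = "ok"
    WARN = "warn"
    ERROR = "error"
    UNKNOWN = "unknown"


class ActionStatusInfo:
    """Local stand-in with the constructor signature shown above."""
    def __init__(self, status, msg=None, state=None, duration=None, data=None):
        self.status, self.msg, self.state = status, msg, state
        self.duration, self.data = duration, data


class DirectoryPlugin:
    """Hypothetical plugin that tracks each deployment as a directory."""
    prefix = "deployment_"

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)

    def _dir_for(self, deployment_info):
        return self.base_dir / f"{self.prefix}{deployment_info['id']}"

    def plugin_start(self):
        # Fail early if the base directory is missing.
        if not self.base_dir.is_dir():
            return ActionStatusInfo(ActionStatus.ERROR, msg=f"{self.base_dir} does not exist")
        return ActionStatusInfo(ActionStatus.OK)

    def deployment_start(self, deployment_info):
        started = time.monotonic()
        self._dir_for(deployment_info).mkdir(exist_ok=True)
        elapsed = time.monotonic() - started  # seconds (assumed unit)
        return ActionStatusInfo(ActionStatus.OK, state="ready", duration=elapsed)

    def deployment_stop(self, deployment_info):
        shutil.rmtree(self._dir_for(deployment_info), ignore_errors=True)
        return ActionStatusInfo(ActionStatus.OK, state="stopped")

    def deployment_list(self):
        # Map each deployment ID (directory name minus the prefix) to its info.
        data = {
            path.name[len(self.prefix):]: {"state": "ready"}
            for path in self.base_dir.glob(self.prefix + "*") if path.is_dir()
        }
        return ActionStatusInfo(ActionStatus.OK, data=data)


with tempfile.TemporaryDirectory() as base_dir:
    plugin = DirectoryPlugin(base_dir)
    start_status = plugin.plugin_start()
    launch = plugin.deployment_start({"id": "deployment-1"})
    listed = plugin.deployment_list()
    stopped = plugin.deployment_stop({"id": "deployment-1"})
    remaining = plugin.deployment_list()
print(launch.state, sorted(listed.data), stopped.state, remaining.data)
```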

### Use the bosun-plugin-runner {: #use-the-bosun-plugin-runner }

The management agent Python package provides the `bosun-plugin-runner` CLI tool, which allows you to invoke the custom plugin class and run a specific action. Using this tool, you can run your plugin in standalone mode while developing and debugging your plugin.

For example:

``` shell
bosun-plugin-runner \
    --plugin sample_plugin/sample_plugin \
    --action pe_status \
    --config sample_configs/action_config_pe_status_only.yaml \
    --private-config sample_configs/sample_plugin_config.yaml \
    --status-file /tmp/status.yaml \
    --show-status
```

The `bosun-plugin-runner` accepts the following arguments:

Argument | Definition
---------|-----------
`--plugin`         | Specifies the module containing the plugin class. This example uses `sample_plugin/sample_plugin` because the plugin class is in the `sample_plugin.py` file inside the `sample_plugin` directory.
`--action`         | Specifies the action to run. This example uses the `pe_status` action. Other supported actions are listed below.
`--config`         | Provides the configuration file to use for the specified action, described in [Create the action config file](#create-the-action-config-file). When your plugin runs as part of the management agent service, this file is generated for you; when testing specific actions manually with `bosun-plugin-runner`, you must create the configuration file yourself.
`--private-config` | Provides a plugin-specific configuration file used only by the plugin.
`--status-file`    | Provides a path for saving the plugin status that results from the action.
`--show-status`    | Shows the contents of the `--status-file` on stdout.

To view the list of actions supported by `bosun-plugin-runner`, use the `--list-actions` option:

``` shell
bosun-plugin-runner --list-actions
# plugin_start
# plugin_stop
# deployment_start
# deployment_stop
# deployment_replace_model
# deployment_status
# pe_status
# deployment_list
```

### Create the action config file {: #create-the-action-config-file }

The `--config` flag passes a YAML configuration file to the plugin. When the plugin runs under the management agent, the agent prepares this configuration and invokes the plugin action with it; during plugin development, however, you may need to write this configuration file yourself.

The typical contents of such a config file are shown below:

``` yaml
pluginConfig:
  name: "ExternalCommand-1"
  type: "ExternalCommand"
  platform: "os"
  commandPrefix: "python3 sample_plugin.py"
  mlopsUrl: "https://app.datarobot.com"

peInfo:
  id: "0x2345"
  name: "Sample-PE"
  description: "some description"
  createdOn: "iso formatted date"
  createdBy: "some username"
  deployments: ["deployment-1", "deployment-2"]
  keyValueConfig:
    max_models: 5

deploymentInfo:
  id: "deployment-1"
  name: "deployment-1"
  description: "Deployment 1 for testing"
  modelId: "model-A"
  modelArtifact: "/tmp/model-A.txt"
  modelExecutionType: "dedicated"
  keyValueConfig:
    key1: "some-value-for-key-1"

```

The action configuration file contains three sections: `pluginConfig`, `peInfo`, and `deploymentInfo`.

The `pluginConfig` section contains general information about the plugin, for example, the ID of the prediction environment, its type, and the platform. It may also contain `mlopsUrl`, the address of the MLOps service (DataRobot), in case the plugin needs to connect to it. This section translates to the `plugin_config` dictionary passed as a constructor argument.

The `peInfo` section contains information about the prediction environment this action refers to. Typically, this information is used for the `pe_status` action. If the `deployments` key contains valid deployment IDs, the plugin is expected to return not only the status of the prediction environment but also the status of the deployments listed under `deployments`.

The `deploymentInfo` section contains information about the deployment in the prediction environment this action refers to. All deployment-related actions use this section to identify which deployment and model to work on. Because this section is particularly important, the following list describes its key fields:

* `id`, `name`, and `description`: Provide information about the deployment as set in DataRobot.

* `modelId`, `modelArtifact`: Indicates the ID of the model and the path where the model can be found. Note that the management agent will place the right model at this path before invoking `deployment_start` or `deployment_replace_model`.

* `keyValueConfig`: Lists the additional configuration for the deployment. Note that this additional config can be set on the deployment in DataRobot. For example, this can be used to specify how much memory the container corresponding to this deployment should use.
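
As a rough sketch of how a deployment action might read these fields, the snippet below works on a dictionary that mirrors the sample config (as if the YAML had already been parsed). The `describe_deployment` helper and the `memory_limit` key are hypothetical names chosen for illustration; they are not defined by the management agent.

```python
# Dictionary mirroring the sample action config, as if the YAML were already parsed.
action_config = {
    "pluginConfig": {"name": "ExternalCommand-1", "type": "ExternalCommand", "platform": "os"},
    "peInfo": {"id": "0x2345", "deployments": ["deployment-1", "deployment-2"]},
    "deploymentInfo": {
        "id": "deployment-1",
        "modelId": "model-A",
        "modelArtifact": "/tmp/model-A.txt",
        "keyValueConfig": {"memory_limit": "512m"},
    },
}


def describe_deployment(config, default_memory="256m"):
    """Collect the fields a deployment action typically needs."""
    info = config["deploymentInfo"]
    # Tolerate a missing or empty keyValueConfig section.
    key_values = info.get("keyValueConfig") or {}
    return {
        "deployment_id": info["id"],
        "model_id": info["modelId"],
        "model_path": info["modelArtifact"],
        # `memory_limit` is a hypothetical key, not one defined by the management agent.
        "memory": key_values.get("memory_limit", default_memory),
    }


summary = describe_deployment(action_config)
print(summary)
```

Falling back to a default when a key is absent keeps the action usable for deployments that have no additional configuration set in DataRobot.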


### Run actions with bosun-plugin-runner {: #run-actions-with-bosun-plugin-runner }

As covered above, during plugin development, you can use `bosun-plugin-runner` to invoke actions. For example, the following command invokes the `deployment_start` action, using the configuration described in the previous section saved to the file `sample_configs/action_config_deployment_1_model_A.yaml`:

``` shell
bosun-plugin-runner \
    --plugin sample_plugin/sample_plugin \
    --config sample_configs/action_config_deployment_1_model_A.yaml \
    --private-config sample_configs/sample_plugin_config.yaml \
    --action deployment_start \
    --status-file /tmp/status.yaml \
    --show-status
```

The status of this `deployment_start` action is captured in the file `/tmp/status.yaml`.

### Configure the command prefix {: #configure-the-command-prefix }

Now that your plugin is ready for the management agent, you can configure the `command` prefix in the management agent configuration file as:

```yaml
    command: "<BOSUN_VENV_PATH>/bin/bosun-plugin-runner --plugin sample_plugin --private-config <CONF_PATH>/plugin.sample_plugin_.conf.yaml"
```

Install the sample plugin in the same virtual environment as the management agent Python package, and ensure the private configuration file path for the plugin is set correctly.
